
Race batch PDLP and dual simplex in strong branching / reliability branching #994

Merged
rapids-bot[bot] merged 80 commits into release/26.04 from race_strong_branching_realibility_branching
Apr 3, 2026

Conversation

@Kh4ster
Contributor

@Kh4ster Kh4ster commented Mar 25, 2026

This PR enables the following regarding batch PDLP:

  • Enable batch PDLP in reliability branching
  • Add work stealing so that batch PDLP and Dual Simplex can run concurrently and steal LPs from each other when one solves an LP first
  • Use the correct problem representation, with cuts, for batch PDLP
  • Use a PDLP warm-start cache across strong branching at the root and in reliability branching
  • Tighten the batch PDLP tolerance to obtain higher-quality solutions
  • Increase the iteration limit so that instances that need a high iteration count (with low cost per iteration) can still come through (only while solving the original LP to obtain warm-start data)
  • Add multiple heuristics that skip batch PDLP to avoid overhead when Dual Simplex is clearly superior
  • Don't store and copy the primal/dual solution unless it is needed, to save memory
  • Handle batch PDLP errors better, allowing Dual Simplex to continue strong branching even if BPDLP fails
  • No early exit in BPDLP if the initial warm-start PDLP solution is already feasible
  • Use the correct objective for BPDLP when there is an offset

Currently we still keep BPDLP off by default, both at the root and in reliability branching.

Kh4ster and others added 30 commits February 9, 2026 17:47
…te_lp_problem' into race_strong_branching_realibility_branching
…ng, correctly fill the ds_obj objective before merging results at the root, correctly clamp the PDLP objective, remove the unnecessary cuopt_assert regarding fixed point error
@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test 0d5ae1c

@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Apr 3, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test 7504340

@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test a7f79fa

@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test dd07576

@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Apr 3, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)

547-584: ⚠️ Potential issue | 🟠 Major

Assert or remap the slack-column suffix invariant.

remove_columns(cols_to_remove) treats new_slacks as arbitrary column indices, but Lines 560-584 still feed the first n entries of root_soln, objective, lower, and upper into PDLP. That only stays aligned if every slack index already lives in [n, lp.num_cols). Otherwise the compacted matrix, the vectors, and the branching indices no longer describe the same columns.

Minimal guard if appended slacks are the intended invariant
   cuopt::mps_parser::mps_data_model_t<i_t, f_t> mps_model;
   int m = lp.num_rows;
   int n = lp.num_cols - new_slacks.size();
   original_root_soln_x.resize(n);
+  for (i_t slack : new_slacks) {
+    assert(slack >= n &&
+           "simplex_problem_to_mps_data_model assumes slack columns are appended");
+  }

If that invariant is not guaranteed, this helper needs a real old→new column map and the callers need to remap the candidate indices as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 547 - 584, The code
removes columns via A_no_slacks.remove_columns(cols_to_remove) but still blindly
slices the first n entries of root_soln, lp.objective, lp.lower, lp.upper into
PDLP (original_root_soln_x and mps_model.set_* calls), which is only correct if
all slack indices in new_slacks are appended (>= n); add an explicit invariant
check or build a proper old→new column map: either assert all new_slacks[j] >= n
before proceeding (and document the invariant), or construct a mapping from
original column index → compacted index after remove_columns and use it to remap
original_root_soln_x, the objective vector, lower/upper bound arrays, and any
branching candidate indices so the CSR matrix and these vectors stay aligned
(refer to cols_to_remove, A_no_slacks, original_root_soln_x,
mps_model.set_objective_coefficients, set_variable_lower_bounds,
set_variable_upper_bounds).
cpp/src/pdlp/pdlp.cu (1)

801-864: ⚠️ Potential issue | 🟠 Major

Mark the last resized-batch climbers as solved before this return.

Unlike Line 865, this early-return branch never calls sb_view_.mark_solved(...) for the climbers still left in the batch. Any Dual Simplex task that has not started those branches yet will keep solving them even though PDLP already finished them, so the final chunk cannot be stolen/skipped.

Suggested fix
     if (original_batch_size != climber_strategies_.size()) {
 #ifdef BATCH_VERBOSE_MODE
       std::cout << "Original batch size was " << original_batch_size << " but is now "
                 << climber_strategies_.size() << std::endl;
 #endif
@@
         if (current_termination_strategy_.get_termination_status(i) !=
             pdlp_termination_status_t::ConcurrentLimit) {
           batch_solution_to_return_
             .get_additional_termination_informations()[climber_strategies_[i].original_index]
             .solved_by = method_t::PDLP;
         }
       }
+      if (sb_view_.is_valid()) {
+        for (size_t i = 0; i < climber_strategies_.size(); ++i) {
+          sb_view_.mark_solved(climber_strategies_[i].original_index);
+        }
+      }
       current_termination_strategy_.fill_gpu_terms_stats(total_pdlp_iterations_);
       RAFT_CUDA_TRY(cudaStreamSynchronize(stream_view_));
       current_termination_strategy_.convert_gpu_terms_stats_to_host(
         batch_solution_to_return_.get_additional_termination_informations());
       return optimization_problem_solution_t<i_t, f_t>{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 801 - 864, This early-return path fails to
notify the solver-bookkeeping that the remaining resized-batch climbers are
finished; before returning, iterate over
current_termination_strategy_.get_terminations_status().size() and call
sb_view_.mark_solved(...) for each climber using
climber_strategies_[i].original_index (and the appropriate termination/solution
metadata you have just copied into batch_solution_to_return_), mirroring the
behavior used at the later return (line 865) so in-progress Dual Simplex tasks
can stop/skip those chunks.
♻️ Duplicate comments (6)
cpp/src/branch_and_bound/pseudo_costs.hpp (1)

416-425: ⚠️ Potential issue | 🟠 Major

Invalidate the PDLP warm-start cache when the LP context changes.

Once populated flips true, this struct has no reset path, so reusing the same pseudo_costs_t after cuts/slacks/original-LP changes can replay stale warm-start vectors and scalars into a different batch LP. Add a reset() helper here and call it from the context-reset path.

Suggested fix
 struct batch_pdlp_warm_cache_t {
   const raft::handle_t batch_pdlp_handle{};
   rmm::device_uvector<f_t> initial_primal{0, batch_pdlp_handle.get_stream()};
   rmm::device_uvector<f_t> initial_dual{0, batch_pdlp_handle.get_stream()};
   f_t step_size{std::numeric_limits<f_t>::signaling_NaN()};
   f_t primal_weight{std::numeric_limits<f_t>::signaling_NaN()};
   i_t pdlp_iteration{-1};
   f_t percent_solved_by_batch_pdlp_at_root{f_t(0.0)};
   bool populated{false};
+
+  void reset()
+  {
+    initial_primal.resize(0, batch_pdlp_handle.get_stream());
+    initial_dual.resize(0, batch_pdlp_handle.get_stream());
+    step_size                           = std::numeric_limits<f_t>::signaling_NaN();
+    primal_weight                       = std::numeric_limits<f_t>::signaling_NaN();
+    pdlp_iteration                      = -1;
+    percent_solved_by_batch_pdlp_at_root = f_t(0.0);
+    populated                           = false;
+  }
 };

As per coding guidelines "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.hpp` around lines 416 - 425, The
batch_pdlp_warm_cache_t struct can retain stale warm-start data after the LP
context changes because there is no reset path; add a member method reset() on
batch_pdlp_warm_cache_t that clears initial_primal and initial_dual
(release/resize to 0 on batch_pdlp_handle.get_stream()), sets step_size and
primal_weight back to NaN, pdlp_iteration to -1,
percent_solved_by_batch_pdlp_at_root to f_t(0.0) and populated to false, and
then call this reset() from the pseudo_costs_t code path that
reinitializes/changes the LP context (where the context-reset currently occurs)
so any time cuts/slacks/original-LP are changed the warm cache is invalidated.
cpp/src/pdlp/solve.cu (2)

1049-1057: ⚠️ Potential issue | 🟠 Major

Validate shared_sb_solved before taking subspans.

The only guard here is !empty(). If the caller passes fewer solved flags than new_bounds.size(), subspan(i, current_batch_size) walks past the end of the coordination array and corrupts cooperative termination state. Check the span length once before the loop.

Suggested fix
+  cuopt_assert(settings.shared_sb_solved.empty() ||
+                 settings.shared_sb_solved.size() >= max_batch_size,
+               "shared_sb_solved must cover every batch entry");
   for (size_t i = 0; i < max_batch_size; i += optimal_batch_size) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 1049 - 1057, The code only checks
settings.shared_sb_solved.empty() but not its length relative to new_bounds, so
subspan(i, current_batch_size) can out-of-bounds; before the batching loop
validate that when settings.shared_sb_solved is non-empty its size() (or
extent()) is >= max_batch_size (or original_new_bounds.size()), and if not,
either trim max_batch_size/current_batch_size accordingly or return/error; then
inside the loop you can safely call settings.shared_sb_solved.subspan(i,
current_batch_size) to populate batch_settings.shared_sb_solved. Ensure you
reference settings.shared_sb_solved, original_new_bounds/new_bounds,
max_batch_size and subspan in the check and handle the shorter span case
consistently.

968-972: ⚠️ Potential issue | 🟠 Major

Don't let sub_batch_size bypass the memory ceiling.

memory_max_batch_size is computed specifically to avoid OOM, but a positive settings.sub_batch_size overwrites it unchecked. A caller can request a sub-batch larger than the memory-checked limit and reintroduce the allocation failure this code just avoided.

Suggested fix
   size_t optimal_batch_size = use_optimal_batch_size
                                 ? detail::optimal_batch_size_handler(problem, memory_max_batch_size)
                                 : max_batch_size;
-  if (settings.sub_batch_size > 0) { optimal_batch_size = settings.sub_batch_size; }
+  if (settings.sub_batch_size > 0) {
+    const auto requested_sub_batch_size = static_cast<size_t>(settings.sub_batch_size);
+    cuopt_assert(requested_sub_batch_size <= memory_max_batch_size,
+                 "sub_batch_size exceeds the memory-checked batch size");
+    optimal_batch_size = requested_sub_batch_size;
+  }

As per coding guidelines "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 968 - 972, The current logic lets
settings.sub_batch_size override the memory-checked limit, which can reintroduce
OOMs; instead, when computing optimal_batch_size in solve.cu (symbols:
optimal_batch_size, settings.sub_batch_size, memory_max_batch_size,
detail::optimal_batch_size_handler, max_batch_size, cuopt_assert), clamp
settings.sub_batch_size to the memory ceiling (e.g., use
min(settings.sub_batch_size, memory_max_batch_size) or validate and reject
values > memory_max_batch_size) so the final optimal_batch_size never exceeds
memory_max_batch_size (and still respects max_batch_size), then keep the
cuopt_assert as a final sanity check.
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)

303-305: ⚠️ Potential issue | 🔴 Critical

Keep batch solution collection opt-out, not opt-in.

With this default at false, generic batch LP solves now return empty primal/dual/reduced-cost buffers unless every caller remembers to opt in. Existing batch paths in cpp/tests/linear_programming/pdlp_test.cu still slice those buffers unconditionally, so this regresses the non-MIP batch API.

Suggested fix
-  // By default to save memory and speed we don't store and copy each climber's primal and dual
-  // solutions We only retrieve termination statistics and the objective values
-  bool generate_batch_primal_dual_solution{false};
+  // Preserve the historical batch behavior and collect per-climber solutions unless callers opt out.
+  bool generate_batch_primal_dual_solution{true};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp` around lines
303 - 305, Defaulting generate_batch_primal_dual_solution to false makes batch
solves opt‑in and breaks callers that expect populated primal/dual/reduced-cost
buffers; change the bool generate_batch_primal_dual_solution default to true (so
batch solution collection is enabled by default/opt‑out) and update the adjacent
comment to reflect "opt-out" semantics so existing tests and code that slice
these buffers continue to work without modification.
cpp/src/branch_and_bound/shared_strong_branching_context.hpp (1)

38-56: ⚠️ Potential issue | 🟠 Major

Make the view helpers release-safe and overflow-safe.

These accessors still rely on assert, so a bad index becomes unchecked UB in release builds. subview() also computes offset + count in i_t before the cast, which can overflow; validate the casted sizes separately and use those converted indices for operator[]/subspan.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp` around lines 38
- 56, The accessors (is_solved, mark_solved, subview) use asserts and can
overflow in release builds; replace the assert-based checks with runtime bounds
validation and throw std::out_of_range on failure, and avoid signed arithmetic
overflow in subview by converting indices to size_t first. Concretely: in
is_solved(i_t local_idx) and mark_solved(i_t local_idx) cast local_idx to size_t
(e.g. size_t idx = static_cast<size_t>(local_idx)), verify local_idx >= 0 and
idx < solved.size(), and throw std::out_of_range with a clear message instead of
asserting; in subview(i_t offset, i_t count) cast both to size_t (size_t off =
static_cast<size_t>(offset), cnt = static_cast<size_t>(count)), ensure offset
and count are non-negative, check off <= solved.size() and cnt <= solved.size()
- off, then return {solved.subspan(off, cnt)} to avoid offset+count overflow.
cpp/tests/linear_programming/pdlp_test.cu (1)

2243-2254: ⚠️ Potential issue | 🟠 Major

Replace the fixed 200 ms sleep with deterministic synchronization.

On afiro-sized batches PDLP can finish before the delay. That makes these tests flaky and can skip the ConcurrentLimit path entirely. Use a latch/promise/barrier that the solver thread signals once the cooperative path is live, then apply the external marks.

Also applies to: 2309-2320

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 2243 - 2254, The
fixed sleep is flaky; replace it with a deterministic synchronization (e.g.,
std::promise/std::future or std::latch) so the main thread waits until the
solver thread actually enters the cooperative DS path. Create a promise/future
(or latch), pass the signal into the pdlp_thread lambda (or have solve_lp expose
a hook) and have the solver thread set the promise when the cooperative path is
live; then have the main thread wait on the future before calling
sb_view.mark_solved(i). Apply the same change to the other occurrence around the
second pdlp_thread usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1044-1054: When simplex_iteration_limit < 1 the code calls
initialize_pseudo_costs_with_estimate(...) which populates pc.strong_branch_down
and pc.strong_branch_up, but the function continues and later merges/overwrites
those estimates with UNSET/NaN values; to fix, add an early return (or
conditional skip of the merge/overwrite block) immediately after calling
initialize_pseudo_costs_with_estimate so the function exits with the
estimate-only pseudo-costs intact and does not proceed to the later
merge/overwrite logic before update_pseudo_costs_from_strong_branching().
- Around line 1587-1595: The current gate "if (rb_mode != 2)" skips the whole
taskloop when rb_mode==2 even if use_pdlp is false, so unreliable_list
candidates never get evaluated; change the condition to allow Dual Simplex
fallback when PDLP is disabled by using "if (rb_mode != 2 || !use_pdlp)" (or
logically equivalent) around the pragma/taskloop that references score_mutex,
sb_view, dual_simplex_obj_down/up, dual_simplex_status_down/up and
unreliable_list, and update any related log text so it correctly reflects that
DS will run when PDLP is unavailable.

In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Line 541: The pdlp_warm_cache member on pseudo_costs_t is mutable shared state
and can race when multiple workers perform reliability/strong branching; make
the cache worker-local or add synchronization: either move the
batch_pdlp_warm_cache_t<i_t, f_t> pdlp_warm_cache out of the shared
pseudo_costs_t (create a stack/local instance inside the branching worker
routine) or add a dedicated omp_mutex_t (e.g., pdlp_warm_cache_mutex) and wrap
every access/update to pdlp_warm_cache in a lock/unlock pair (similar to
existing omp_atomic_t/omp_mutex_t usage) to prevent cross-contamination of
device buffers, scalars, and populated state. Ensure all functions that read or
write pdlp_warm_cache (reliability/strong branching paths) use the new mutex or
the local per-worker instance.

In `@cpp/src/dual_simplex/simplex_solver_settings.hpp`:
- Around line 194-197: The member mip_batch_pdlp_reliability_branching is not
initialized in the class constructor, leaving it with indeterminate value;
update the constructor initializer list (next to
mip_batch_pdlp_strong_branching(0)) to include
mip_batch_pdlp_reliability_branching(0) so the default-constructed
SimplexSolverSettings/related class always sets this field to a defined value.

In `@cpp/src/mip_heuristics/CMakeLists.txt`:
- Line 17: Remove clique_table.cu from the MIP_LP_NECESSARY_FILES list and add
it to the MIP_NON_LP_FILES list in the CMakeLists.txt so that clique_table.cu
(which includes mip_heuristics/utils.cuh) is built only as MIP-non-LP code;
specifically, edit the lines that construct the MIP_LP_NECESSARY_FILES and
MIP_NON_LP_FILES variables to stop listing clique_table.cu in the former and
include it in the latter, and verify any targets or install/grouping logic that
consumes those variables still references the file from the new list.

In `@cpp/src/pdlp/pdlp.cuh`:
- Around line 142-143: The initialization of sb_view_
(dual_simplex::shared_strong_branching_context_view_t<i_t, f_t>) is passing
settings_.shared_sb_solved which is a cuda::std::span<std::atomic<int>> into a
constructor expecting std::span<std::atomic<int>>; explicitly construct a
std::span from the underlying data and size (use
settings_.shared_sb_solved.data() and settings_.shared_sb_solved.size()) so the
types match instead of relying on an implicit conversion between cuda::std::span
and std::span.

In `@cpp/src/pdlp/solve.cu`:
- Around line 897-903: The batch memory estimator always accounts for three full
output buffers but the actual allocations for the third (solution collection)
buffer are skipped when generate_batch_primal_dual_solution is false; update the
total_memory calculation in solve.cu so it conditionally charges the third
output buffer only when generate_batch_primal_dual_solution is true (adjust the
same logic around the corresponding block later in the function where
allocations are made; see references to total_memory, trial_batch_size,
problem.get_n_variables(), problem.get_n_constraints(), and
generate_batch_primal_dual_solution and apply the same fix for the duplicate
estimator at the other occurrence mentioned in the comment).

In `@cpp/src/pdlp/solver_settings.cu`:
- Around line 62-88: The bulk warm-start path in set_pdlp_warm_start_data is
writing initial_primal_weight and initial_step_size directly and bypasses the
numeric guards; update set_pdlp_warm_start_data to validate those values the
same way as set_initial_primal_weight and set_initial_step_size (either by
calling those two setters from set_pdlp_warm_start_data or by applying the same
cuopt_expects checks: >0, !isinf, !isnan) so the warm-start overload enforces
the same invariants before assigning initial_primal_weight_ and
initial_step_size_.

In `@cpp/src/pdlp/translate.hpp`:
- Line 12: Remove the unused CUDA device header include from translate.hpp:
delete the line including <mip_heuristics/problem/problem.cuh> because
translate.hpp only references detail::problem_t from
optimization_problem_interface.hpp and does not use any symbols from
problem.cuh; after removal, confirm translate.hpp still compiles standalone and
that no other code relies on that include being present.

---

Outside diff comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 547-584: The code removes columns via
A_no_slacks.remove_columns(cols_to_remove) but still blindly slices the first n
entries of root_soln, lp.objective, lp.lower, lp.upper into PDLP
(original_root_soln_x and mps_model.set_* calls), which is only correct if all
slack indices in new_slacks are appended (>= n); add an explicit invariant check
or build a proper old→new column map: either assert all new_slacks[j] >= n
before proceeding (and document the invariant), or construct a mapping from
original column index → compacted index after remove_columns and use it to remap
original_root_soln_x, the objective vector, lower/upper bound arrays, and any
branching candidate indices so the CSR matrix and these vectors stay aligned
(refer to cols_to_remove, A_no_slacks, original_root_soln_x,
mps_model.set_objective_coefficients, set_variable_lower_bounds,
set_variable_upper_bounds).

In `@cpp/src/pdlp/pdlp.cu`:
- Around line 801-864: This early-return path fails to notify the
solver-bookkeeping that the remaining resized-batch climbers are finished;
before returning, iterate over
current_termination_strategy_.get_terminations_status().size() and call
sb_view_.mark_solved(...) for each climber using
climber_strategies_[i].original_index (and the appropriate termination/solution
metadata you have just copied into batch_solution_to_return_), mirroring the
behavior used at the later return (line 865) so in-progress Dual Simplex tasks
can stop/skip those chunks.

---

Duplicate comments:
In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp`:
- Around line 303-305: Defaulting generate_batch_primal_dual_solution to false
makes batch solves opt‑in and breaks callers that expect populated
primal/dual/reduced-cost buffers; change the bool
generate_batch_primal_dual_solution default to true (so batch solution
collection is enabled by default/opt‑out) and update the adjacent comment to
reflect "opt-out" semantics so existing tests and code that slice these buffers
continue to work without modification.

In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Around line 416-425: The batch_pdlp_warm_cache_t struct can retain stale
warm-start data after the LP context changes because there is no reset path; add
a member method reset() on batch_pdlp_warm_cache_t that clears initial_primal
and initial_dual (release/resize to 0 on batch_pdlp_handle.get_stream()), sets
step_size and primal_weight back to NaN, pdlp_iteration to -1,
percent_solved_by_batch_pdlp_at_root to f_t(0.0) and populated to false, and
then call this reset() from the pseudo_costs_t code path that
reinitializes/changes the LP context (where the context-reset currently occurs)
so any time cuts/slacks/original-LP are changed the warm cache is invalidated.

In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp`:
- Around line 38-56: The accessors (is_solved, mark_solved, subview) use asserts
and can overflow in release builds; replace the assert-based checks with runtime
bounds validation and throw std::out_of_range on failure, and avoid signed
arithmetic overflow in subview by converting indices to size_t first.
Concretely: in is_solved(i_t local_idx) and mark_solved(i_t local_idx) cast
local_idx to size_t (e.g. size_t idx = static_cast<size_t>(local_idx)), verify
local_idx >= 0 and idx < solved.size(), and throw std::out_of_range with a clear
message instead of asserting; in subview(i_t offset, i_t count) cast both to
size_t (size_t off = static_cast<size_t>(offset), cnt =
static_cast<size_t>(count)), ensure offset and count are non-negative, check off
<= solved.size() and cnt <= solved.size() - off, then return
{solved.subspan(off, cnt)} to avoid offset+count overflow.

In `@cpp/src/pdlp/solve.cu`:
- Around line 1049-1057: The code only checks settings.shared_sb_solved.empty()
but not its length relative to new_bounds, so subspan(i, current_batch_size) can
out-of-bounds; before the batching loop validate that when
settings.shared_sb_solved is non-empty its size() (or extent()) is >=
max_batch_size (or original_new_bounds.size()), and if not, either trim
max_batch_size/current_batch_size accordingly or return/error; then inside the
loop you can safely call settings.shared_sb_solved.subspan(i,
current_batch_size) to populate batch_settings.shared_sb_solved. Ensure you
reference settings.shared_sb_solved, original_new_bounds/new_bounds,
max_batch_size and subspan in the check and handle the shorter span case
consistently.
- Around line 968-972: The current logic lets settings.sub_batch_size override
the memory-checked limit, which can reintroduce OOMs; instead, when computing
optimal_batch_size in solve.cu (symbols: optimal_batch_size,
settings.sub_batch_size, memory_max_batch_size,
detail::optimal_batch_size_handler, max_batch_size, cuopt_assert), clamp
settings.sub_batch_size to the memory ceiling (e.g., use
min(settings.sub_batch_size, memory_max_batch_size) or validate and reject
values > memory_max_batch_size) so the final optimal_batch_size never exceeds
memory_max_batch_size (and still respects max_batch_size), then keep the
cuopt_assert as a final sanity check.

In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 2243-2254: The fixed sleep is flaky; replace it with a
deterministic synchronization (e.g., std::promise/std::future or std::latch) so
the main thread waits until the solver thread actually enters the cooperative DS
path. Create a promise/future (or latch), pass the signal into the pdlp_thread
lambda (or have solve_lp expose a hook) and have the solver thread set the
promise when the cooperative path is live; then have the main thread wait on the
future before calling sb_view.mark_solved(i). Apply the same change to the other
occurrence around the second pdlp_thread usage.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 982cb111-beb8-4701-9cff-5285678a13b5

📥 Commits

Reviewing files that changed from the base of the PR and between 928a1e1 and dd07576.

📒 Files selected for processing (23)
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/branch_and_bound/pseudo_costs.hpp
  • cpp/src/branch_and_bound/shared_strong_branching_context.hpp
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/mip_heuristics/CMakeLists.txt
  • cpp/src/mip_heuristics/solver.cu
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/pdlp.cuh
  • cpp/src/pdlp/pdlp_constants.hpp
  • cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/solver_settings.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.hpp
  • cpp/src/pdlp/translate.hpp
  • cpp/src/pdlp/utilities/ping_pong_graph.cu
  • cpp/tests/linear_programming/pdlp_test.cu
  • python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
💤 Files with no reviewable changes (1)
  • cpp/src/pdlp/pdlp_constants.hpp

omp_atomic_t<i_t> num_strong_branches_completed = 0;
omp_atomic_t<int64_t> strong_branching_lp_iter = 0;

batch_pdlp_warm_cache_t<i_t, f_t> pdlp_warm_cache;


⚠️ Potential issue | 🟠 Major

Don't store the warm-start cache as unsynchronized shared worker state.

pseudo_costs_t already uses omp_atomic_t and omp_mutex_t for data that workers update concurrently, but pdlp_warm_cache is a plain mutable member. If two workers hit reliability/strong branching at once, they can race on the cached device buffers, scalars, and populated, cross-contaminating warm starts between nodes. Make the cache worker-local or guard it with a dedicated mutex.

As per coding guidelines "Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.hpp` at line 541, The pdlp_warm_cache
member on pseudo_costs_t is mutable shared state and can race when multiple
workers perform reliability/strong branching; make the cache worker-local or add
synchronization: either move the batch_pdlp_warm_cache_t<i_t, f_t>
pdlp_warm_cache out of the shared pseudo_costs_t (create a stack/local instance
inside the branching worker routine) or add a dedicated omp_mutex_t (e.g.,
pdlp_warm_cache_mutex) and wrap every access/update to pdlp_warm_cache in a
lock/unlock pair (similar to existing omp_atomic_t/omp_mutex_t usage) to prevent
cross-contamination of device buffers, scalars, and populated state. Ensure all
functions that read or write pdlp_warm_cache (reliability/strong branching
paths) use the new mutex or the local per-worker instance.

Comment on lines 62 to 88
void pdlp_solver_settings_t<i_t, f_t>::set_initial_step_size(f_t initial_step_size)
{
cuopt_expects(initial_step_size > f_t(0),
error_type_t::ValidationError,
"Initial step size must be greater than 0");
cuopt_expects(!std::isinf(initial_step_size),
error_type_t::ValidationError,
"Initial step size must be finite");
cuopt_expects(!std::isnan(initial_step_size),
error_type_t::ValidationError,
"Initial step size must be a number");
initial_step_size_ = std::make_optional(initial_step_size);
}

template <typename i_t, typename f_t>
void pdlp_solver_settings_t<i_t, f_t>::set_initial_primal_weight(f_t initial_primal_weight)
{
cuopt_expects(initial_primal_weight > f_t(0),
error_type_t::ValidationError,
"Initial primal weight must be greater than 0");
cuopt_expects(!std::isinf(initial_primal_weight),
error_type_t::ValidationError,
"Initial primal weight must be finite");
cuopt_expects(!std::isnan(initial_primal_weight),
error_type_t::ValidationError,
"Initial primal weight must be a number");
initial_primal_weight_ = std::make_optional(initial_primal_weight);

⚠️ Potential issue | 🟠 Major

The bulk warm-start path still bypasses these new numeric guards.

set_pdlp_warm_start_data(...) later in this file writes the warm-start initial_primal_weight and initial_step_size directly at Lines 316-317, so callers using that overload can still inject <=0, inf, or nan values. Please reuse the same validation there so both entry points enforce the same solver-state invariant. As per coding guidelines: "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".

🔧 Suggested direction
   cuopt_expects(last_restart_duality_gap_dual_solution != nullptr,
                 error_type_t::ValidationError,
                 "last_restart_duality_gap_dual_solution cannot be null");
+  cuopt_expects(initial_primal_weight > f_t(0) &&
+                  !std::isinf(initial_primal_weight) &&
+                  !std::isnan(initial_primal_weight),
+                error_type_t::ValidationError,
+                "Initial primal weight must be finite and greater than 0");
+  cuopt_expects(initial_step_size > f_t(0) &&
+                  !std::isinf(initial_step_size) &&
+                  !std::isnan(initial_step_size),
+                error_type_t::ValidationError,
+                "Initial step size must be finite and greater than 0");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solver_settings.cu` around lines 62 - 88, The bulk warm-start
path in set_pdlp_warm_start_data is writing initial_primal_weight and
initial_step_size directly and bypasses the numeric guards; update
set_pdlp_warm_start_data to validate those values the same way as
set_initial_primal_weight and set_initial_step_size (either by calling those two
setters from set_pdlp_warm_start_data or by applying the same cuopt_expects
checks: >0, !isinf, !isnan) so the warm-start overload enforces the same
invariants before assigning initial_primal_weight_ and initial_step_size_.
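One way to keep both entry points in sync, as the comment suggests, is to funnel them through a single validation helper. This is a sketch under assumptions: `validate_positive_finite`, the pared-down `Settings` struct, and the use of `std::invalid_argument` in place of `cuopt_expects` are illustrative, not the real cuOpt API.

```cpp
#include <cmath>
#include <optional>
#include <stdexcept>
#include <string>

// Shared numeric guard: rejects <= 0, NaN (any comparison with NaN is
// false, so !(v > 0) catches it), and +/-infinity.
static void validate_positive_finite(double v, const std::string& name) {
  if (!(v > 0.0) || std::isinf(v) || std::isnan(v)) {
    throw std::invalid_argument(name + " must be finite and greater than 0");
  }
}

struct Settings {
  std::optional<double> initial_step_size_;
  std::optional<double> initial_primal_weight_;

  void set_initial_step_size(double s) {
    validate_positive_finite(s, "Initial step size");
    initial_step_size_ = s;
  }
  void set_initial_primal_weight(double w) {
    validate_positive_finite(w, "Initial primal weight");
    initial_primal_weight_ = w;
  }
  // Bulk warm-start entry point reuses the same setters, so it cannot
  // bypass the guard the way a direct member write would.
  void set_warm_start(double step, double weight) {
    set_initial_step_size(step);
    set_initial_primal_weight(weight);
  }
};
```

Calling the existing setters from the bulk path (rather than duplicating the checks) also means any future tightening of the guard applies to both overloads automatically.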

@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test bb50ad1


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 780-798: Add a single upfront cuopt_assert before using
climber_strategies_[i].original_index to ensure the shared strong-branching span
matches the original batch width: when sb_view_.is_valid() is true, assert that
sb_view_ (or shared_sb_solved/span) size/width equals the original batch width
used to generate climber_strategies_ (so original_index is in-bounds for
shared_strong_branching_context_view_t). Place this check immediately inside the
if (sb_view_.is_valid()) block guarding the loop that accesses
climber_strategies_ (and add the same guard at the other occurrences noted
around the blocks using original_index at the locations corresponding to lines
~849-850, ~866-869, and ~928-929).
- Around line 789-790: When you set a climber to
pdlp_termination_status_t::ConcurrentLimit via
current_termination_strategy_.set_termination_status, do not serialize PDLP
iterates into that climber's result: either propagate the external solver's
result into the per-climber payload or skip filling primal/dual/reduced-cost
buffers, step metadata and the solved_by field for that climber. Locate the
blocks that read from pdhg_solver_/PDLP state and populate per-climber solution
data (the occurrences around the current call and the other ranges noted:
~818-849, 872-878, 897-928) and add a guard that if status ==
pdlp_termination_status_t::ConcurrentLimit then avoid populating the iterate
buffers and step metadata (or instead copy the externally-provided result
structure) and ensure solved_by is set consistently from the external-solver
branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 17941937-9cfb-4973-a40a-f3ce01b5a538

📥 Commits

Reviewing files that changed from the base of the PR and between dd07576 and bb50ad1.

📒 Files selected for processing (1)
  • cpp/src/pdlp/pdlp.cu

Comment on lines +780 to +798
if (sb_view_.is_valid()) {
for (size_t i = 0; i < climber_strategies_.size(); ++i) {
// If PDLP has solved it to optimality we want to keep it and resolved both solvers having
// solved the problem later
if (current_termination_strategy_.is_done(
current_termination_strategy_.get_termination_status(i)))
continue;
const i_t local_idx = climber_strategies_[i].original_index;
if (sb_view_.is_solved(local_idx)) {
current_termination_strategy_.set_termination_status(
i, pdlp_termination_status_t::ConcurrentLimit);
#ifdef BATCH_VERBOSE_MODE
std::cout << "[COOP SB] DS already solved climber " << i << " (original_index " << local_idx
<< "), synced to ConcurrentLimit at step " << internal_solver_iterations_
<< std::endl;
#endif
}
}
}

⚠️ Potential issue | 🟠 Major

Validate shared_sb_solved against the original batch size before using original_index.

sb_view_.is_valid() only proves the span is non-empty. These loads/stores use climber_strategies_[i].original_index, which stays in the original batch numbering after compaction. If the caller passes a shorter span, the assert inside shared_strong_branching_context_view_t disappears in release builds and this becomes an out-of-bounds atomic access. Please add a single upfront cuopt_assert that the shared span matches the original batch width.

Suggested guard
+if (sb_view_.is_valid()) {
+  cuopt_assert(sb_view_.solved.size() == settings_.new_bounds.size(),
+               "shared_sb_solved must match the original batch size");
+}
+
 if (sb_view_.is_valid()) {
   for (size_t i = 0; i < climber_strategies_.size(); ++i) {

As per coding guidelines, "Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations".

Also applies to: 849-850, 866-869, 928-929

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 780 - 798, Add a single upfront
cuopt_assert before using climber_strategies_[i].original_index to ensure the
shared strong-branching span matches the original batch width: when
sb_view_.is_valid() is true, assert that sb_view_ (or shared_sb_solved/span)
size/width equals the original batch width used to generate climber_strategies_
(so original_index is in-bounds for shared_strong_branching_context_view_t).
Place this check immediately inside the if (sb_view_.is_valid()) block guarding
the loop that accesses climber_strategies_ (and add the same guard at the other
occurrences noted around the blocks using original_index at the locations
corresponding to lines ~849-850, ~866-869, and ~928-929).
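The requested upfront check amounts to verifying once that every mapped index is addressable before any of them is dereferenced. A minimal sketch, with illustrative names (`ClimberStrategy`, `indices_in_bounds`) standing in for the real structures; production code would `cuopt_assert` on this rather than return a bool:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for climber_strategies_ entries: each climber remembers its
// position in the original (pre-compaction) batch numbering.
struct ClimberStrategy { std::size_t original_index; };

// Returns true when every original_index fits in a shared span of
// span_size slots. Checking once, before the loop, turns a silent
// out-of-bounds atomic access in release builds into an explicit failure.
bool indices_in_bounds(const std::vector<ClimberStrategy>& strategies,
                       std::size_t span_size) {
  for (const auto& s : strategies) {
    if (s.original_index >= span_size) return false;
  }
  return true;
}
```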

Comment on lines +789 to +790
current_termination_strategy_.set_termination_status(
i, pdlp_termination_status_t::ConcurrentLimit);

⚠️ Potential issue | 🟠 Major

Don't serialize PDLP iterates as the result for ConcurrentLimit climbers.

Once a climber is flipped to pdlp_termination_status_t::ConcurrentLimit, this method still fills its primal/dual/reduced-cost buffers and step metadata from pdhg_solver_ / PDLP state. That means the status says “externally finished”, but the payload is just PDLP’s last iterate, and solved_by is only written on the PDLP branch. Please either plumb the external solver result through here or skip populating per-climber solution payloads for ConcurrentLimit.

As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results".

Also applies to: 818-849, 872-878, 897-928

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 789 - 790, When you set a climber to
pdlp_termination_status_t::ConcurrentLimit via
current_termination_strategy_.set_termination_status, do not serialize PDLP
iterates into that climber's result: either propagate the external solver's
result into the per-climber payload or skip filling primal/dual/reduced-cost
buffers, step metadata and the solved_by field for that climber. Locate the
blocks that read from pdhg_solver_/PDLP state and populate per-climber solution
data (the occurrences around the current call and the other ranges noted:
~818-849, 872-878, 897-928) and add a guard that if status ==
pdlp_termination_status_t::ConcurrentLimit then avoid populating the iterate
buffers and step metadata (or instead copy the externally-provided result
structure) and ensure solved_by is set consistently from the external-solver
branch.

@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test 0a8b10a


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (4)
cpp/src/branch_and_bound/pseudo_costs.cpp (4)

1617-1624: Minor: Guard against division by near-zero in pseudo-cost update.

If leaf_solution.x[j] becomes very close to an integer due to numerical effects, change_in_x could be extremely small, producing very large pseudo-cost values that could distort branching decisions.

Optional defensive guard
+          constexpr f_t min_frac = 1e-8;
           f_t change_in_obj = std::max(obj - node_ptr->lower_bound, eps);
           f_t change_in_x   = leaf_solution.x[j] - std::floor(leaf_solution.x[j]);
-          pseudo_cost_sum_down[j] += change_in_obj / change_in_x;
+          if (change_in_x > min_frac) {
+            pseudo_cost_sum_down[j] += change_in_obj / change_in_x;
+          }
           pseudo_cost_num_down[j]++;

As per coding guidelines: "Check numerical stability... division by zero/near-zero".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1617 - 1624, Guard
the pseudo-cost update against division by near-zero: compute change_in_x =
leaf_solution.x[j] - std::floor(leaf_solution.x[j]) and if fabs(change_in_x) is
below a small threshold (e.g. 1e-12) either skip the update or clamp change_in_x
to that threshold before doing pseudo_cost_sum_down[j] += (change_in_obj /
change_in_x) and incrementing pseudo_cost_num_down[j]; keep the existing
change_in_obj calculation (obj - node_ptr->lower_bound, eps) and leave the
rb_mode/is_dual_simplex_done / sb_view.mark_solved logic unchanged.
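The guarded update can be reduced to a few lines. This sketch uses an illustrative `PseudoCostAccum` struct and threshold; the real code accumulates into `pseudo_cost_sum_down[j]` / `pseudo_cost_num_down[j]` arrays, and the appropriate `min_frac` value is a tuning decision.

```cpp
#include <cmath>

// Minimal sketch of the guarded pseudo-cost update: skip the sample when
// the fractional part is numerically indistinguishable from an integer,
// so one near-integer iterate cannot blow up the running average.
struct PseudoCostAccum {
  double sum   = 0.0;
  int    count = 0;

  // Returns true when the sample was accepted into the average.
  bool add_down_sample(double x, double change_in_obj,
                       double min_frac = 1e-8) {
    double change_in_x = x - std::floor(x);   // fractional part, in [0, 1)
    if (change_in_x <= min_frac) return false;  // near-integer: skip
    sum += change_in_obj / change_in_x;
    ++count;
    return true;
  }
};
```

Skipping (rather than clamping) keeps the average unbiased; clamping to `min_frac` would instead record a capped-but-still-large contribution.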

735-737: Redundant check: assertion makes the if-condition always true.

The assertion at line 735 guarantees !populated is true, making the if-check at line 737 redundant in debug builds. In release builds (where the assert is stripped), if populated is unexpectedly true, the block would silently repopulate the cache and overwrite the existing warm-start data.

Consider one of these approaches:

  1. Remove the if-check if the assertion is the invariant you want to enforce
  2. Remove the assertion and keep only the if-check with early return if already populated
  3. If both paths should be valid, add an else branch to handle the already-populated case
Option: Keep assertion, remove redundant if
   assert(!pc.pdlp_warm_cache.populated && "PDLP warm cache should not be populated at this point");

-  if (!pc.pdlp_warm_cache.populated) {
     pdlp_solver_settings_t<i_t, f_t> ws_settings;
     // ... rest of the block
-  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 735 - 737, The assert
on pc.pdlp_warm_cache.populated makes the subsequent if-check redundant in debug
builds but unsafe in release builds; remove the assert line (the assertion
referencing pc.pdlp_warm_cache.populated) and keep the existing if
(!pc.pdlp_warm_cache.populated) block, adding an early return or skipping logic
when populated is true so the code behaves correctly in both debug and release
builds and avoids double-populating the PDLP warm cache.

614-628: Consider using a tolerance check for sigma instead of exact equality.

While sigma is structurally expected to be exactly ±1, the direct floating-point equality comparison violates the coding guideline about using epsilon comparisons. If matrix values are ever scaled or perturbed during preprocessing, this assertion could unexpectedly fail.

Suggested defensive check
-    if (sigma == -1) {
+    constexpr f_t sigma_tol = 1e-9;
+    if (std::abs(sigma - (-1.0)) < sigma_tol) {
       constraint_lower[i] = slack_lower + lp.rhs[i];
       constraint_upper[i] = slack_upper + lp.rhs[i];
-    } else if (sigma == 1) {
+    } else if (std::abs(sigma - 1.0) < sigma_tol) {
       constraint_lower[i] = -slack_upper + lp.rhs[i];
       constraint_upper[i] = -slack_lower + lp.rhs[i];
     } else {
-      assert(sigma == 1.0 || sigma == -1.0);
+      assert(false && "sigma must be +1 or -1");
     }

As per coding guidelines: "use epsilon comparisons for floating-point equality checks".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 614 - 628, The code
in pseudo_costs.cpp compares the floating value sigma to -1 and 1 using exact
equality; change this to an epsilon-based check (e.g., use fabs(sigma - 1.0) <
EPS and fabs(sigma + 1.0) < EPS with a small EPS like 1e-9 or a named constant)
in the branch that assigns constraint_lower/constraint_upper, and replace the
final assert(sigma == 1.0 || sigma == -1.0) with an assertion that one of the
epsilon checks passed (or handle the unexpected value by logging/error). Update
occurrences referencing sigma, slack_map, lp.A.x, constraint_lower and
constraint_upper accordingly.
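The epsilon-based classification of sigma can be isolated into a small helper. A minimal sketch, assuming an illustrative `Sigma` enum and tolerance; the real code would branch into the `constraint_lower`/`constraint_upper` assignments and assert on the `Invalid` case.

```cpp
#include <cmath>

// Epsilon-based classification of the +/-1 row-scaling factor, replacing
// the exact floating-point equality the review flags. The tolerance is
// illustrative; anything comfortably above accumulated rounding error works.
enum class Sigma { Plus, Minus, Invalid };

Sigma classify_sigma(double sigma, double eps = 1e-9) {
  if (std::fabs(sigma - 1.0) < eps) return Sigma::Plus;
  if (std::fabs(sigma + 1.0) < eps) return Sigma::Minus;
  return Sigma::Invalid;  // real code would assert/log here
}
```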

1189-1212: Clarify merge behavior when source is NONE in mode 2.

At line 1192:

if (source == sb_source_t::PDLP || effective_batch_pdlp == 2) { sb_dest = value; }

In effective_batch_pdlp == 2 mode, this writes value unconditionally. When source == sb_source_t::NONE, value is NaN. Consider adding explicit handling:

Suggested clarification
-        if (source == sb_source_t::PDLP || effective_batch_pdlp == 2) { sb_dest = value; }
+        // In mode 2 (PDLP-only), always use PDLP result. In mode 1, use PDLP only when it wins.
+        // Note: if both failed, value is NaN which preserves the "unknown" state.
+        if (source == sb_source_t::PDLP || 
+            (effective_batch_pdlp == 2 && source != sb_source_t::NONE)) { 
+          sb_dest = value; 
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1189 - 1212, The code
unconditionally assigns sb_dest = value when effective_batch_pdlp == 2 even if
merge_sb_result returned source == sb_source_t::NONE (value is NaN); change the
assignment to only set sb_dest when the source is valid (e.g. source !=
sb_source_t::NONE) or when value is finite (use !std::isnan(value) /
std::isfinite), and ensure the merged_* counters and verbose logging (merged_nan
/ merged_from_pdlp / merged_from_ds and the "[COOP SB] Merge" message) handle
the NONE case consistently; update the conditional around sb_dest assignment
(the line involving effective_batch_pdlp and sb_source_t::PDLP) and any
subsequent branches that assume a valid value from merge_sb_result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1044-1054: When simplex_iteration_limit < 1 the function calls
initialize_pseudo_costs_with_estimate(...) but then continues into the merge
logic which can overwrite valid estimates with NaN (pdlp_obj_down/up and
dual_simplex_status_* are unset); add an early return immediately after
initialize_pseudo_costs_with_estimate(...) to skip the subsequent
merge_sb_result/merge logic so pc.strong_branch_down and pc.strong_branch_up are
not overwritten (this prevents the branch that checks sb_source_t::PDLP or
effective_batch_pdlp == 2 from writing NaN back into the pseudo-costs).
- Around line 778-785: The async device copies into cache.initial_primal and
cache.initial_dual use ws_primal.stream()/ws_dual.stream() which may outlive
ws_solution and cause a use-after-free; fix by performing the copy on a stream
owned by the cache or by synchronizing the source stream before destroying
ws_solution—e.g., construct rmm::device_uvector using cache's stream (or call
cudaStreamSynchronize/ws_primal.stream() after the copy) so that the async
memcpy initiated by rmm::device_uvector (used in the pseudo_costs.cpp block
populating cache.initial_primal/initial_dual and setting cache.populated) cannot
access freed source memory.
- Around line 1543-1560: The OpenMP task/taskloop pragmas in
reliable_variable_selection are being emitted even when not inside a parallel
region; wrap the omp task and omp taskloop usage with runtime guards (e.g. check
omp_in_parallel()) or provide an alternative sequential path so the code behaves
correctly in single-threaded runs. Specifically, before emitting the "#pragma
omp task" that calls batch_pdlp_reliability_branching_task and before the
"#pragma omp taskloop" block in reliable_variable_selection, detect
omp_in_parallel() and only use the task/taskloop when true; otherwise call
batch_pdlp_reliability_branching_task (and the taskloop body) directly in a loop
on the calling thread to preserve semantics. Ensure checks are adjacent to the
existing pragmas and reference reliable_variable_selection,
batch_pdlp_reliability_branching_task, and the taskloop block so the change is
obvious and localized.

---

Nitpick comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1617-1624: Guard the pseudo-cost update against division by
near-zero: compute change_in_x = leaf_solution.x[j] -
std::floor(leaf_solution.x[j]) and if fabs(change_in_x) is below a small
threshold (e.g. 1e-12) either skip the update or clamp change_in_x to that
threshold before doing pseudo_cost_sum_down[j] += (change_in_obj / change_in_x)
and incrementing pseudo_cost_num_down[j]; keep the existing change_in_obj
calculation (obj - node_ptr->lower_bound, eps) and leave the
rb_mode/is_dual_simplex_done / sb_view.mark_solved logic unchanged.
- Around line 735-737: The assert on pc.pdlp_warm_cache.populated makes the
subsequent if-check redundant in debug builds but unsafe in release builds;
remove the assert line (the assertion referencing pc.pdlp_warm_cache.populated)
and keep the existing if (!pc.pdlp_warm_cache.populated) block, adding an early
return or skipping logic when populated is true so the code behaves correctly in
both debug and release builds and avoids double-populating the PDLP warm cache.
- Around line 614-628: The code in pseudo_costs.cpp compares the floating value
sigma to -1 and 1 using exact equality; change this to an epsilon-based check
(e.g., use fabs(sigma - 1.0) < EPS and fabs(sigma + 1.0) < EPS with a small EPS
like 1e-9 or a named constant) in the branch that assigns
constraint_lower/constraint_upper, and replace the final assert(sigma == 1.0 ||
sigma == -1.0) with an assertion that one of the epsilon checks passed (or
handle the unexpected value by logging/error). Update occurrences referencing
sigma, slack_map, lp.A.x, constraint_lower and constraint_upper accordingly.
- Around line 1189-1212: The code unconditionally assigns sb_dest = value when
effective_batch_pdlp == 2 even if merge_sb_result returned source ==
sb_source_t::NONE (value is NaN); change the assignment to only set sb_dest when
the source is valid (e.g. source != sb_source_t::NONE) or when value is finite
(use !std::isnan(value) / std::isfinite), and ensure the merged_* counters and
verbose logging (merged_nan / merged_from_pdlp / merged_from_ds and the "[COOP
SB] Merge" message) handle the NONE case consistently; update the conditional
around sb_dest assignment (the line involving effective_batch_pdlp and
sb_source_t::PDLP) and any subsequent branches that assume a valid value from
merge_sb_result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e6684700-8d0e-4e59-ac7e-f0676d2019c2

📥 Commits

Reviewing files that changed from the base of the PR and between bb50ad1 and 0a8b10a.

📒 Files selected for processing (3)
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/pdlp/solve.cu
🚧 Files skipped from review as they are similar to previous changes (2)
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/pdlp/solve.cu

Comment on lines +778 to +785
// Need to use the pc stream since the batch pdlp handle will get destroyed after the warm
// start
cache.initial_primal = rmm::device_uvector<f_t>(ws_primal, ws_primal.stream());
cache.initial_dual = rmm::device_uvector<f_t>(ws_dual, ws_dual.stream());
cache.step_size = ws_solution.get_pdlp_warm_start_data().initial_step_size_;
cache.primal_weight = ws_solution.get_pdlp_warm_start_data().initial_primal_weight_;
cache.pdlp_iteration = ws_solution.get_pdlp_warm_start_data().total_pdlp_iterations_;
cache.populated = true;

⚠️ Potential issue | 🟠 Major

Potential use-after-free: async device copy may not complete before source is destroyed.

The comment mentions needing to use a specific stream, but the code uses ws_primal.stream() and ws_dual.stream() from ws_solution. The rmm::device_uvector copy constructor performs an async memcpy on the given stream. If ws_solution (and its underlying device memory) is destroyed before the async copy completes, this could cause undefined behavior.

Suggested fix: use the cache's stream or synchronize
       // Need to use the pc stream since the batch pdlp handle will get destroyed after the warm
       // start
-      cache.initial_primal = rmm::device_uvector<f_t>(ws_primal, ws_primal.stream());
-      cache.initial_dual   = rmm::device_uvector<f_t>(ws_dual, ws_dual.stream());
+      auto cache_stream = cache.batch_pdlp_handle.get_stream();
+      cache.initial_primal = rmm::device_uvector<f_t>(ws_primal, cache_stream);
+      cache.initial_dual   = rmm::device_uvector<f_t>(ws_dual, cache_stream);
+      // Ensure copy completes before ws_solution goes out of scope
+      cudaStreamSynchronize(cache_stream);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 778 - 785, The async
device copies into cache.initial_primal and cache.initial_dual use
ws_primal.stream()/ws_dual.stream() which may outlive ws_solution and cause a
use-after-free; fix by performing the copy on a stream owned by the cache or by
synchronizing the source stream before destroying ws_solution—e.g., construct
rmm::device_uvector using cache's stream (or call
cudaStreamSynchronize/ws_primal.stream() after the copy) so that the async
memcpy initiated by rmm::device_uvector (used in the pseudo_costs.cpp block
populating cache.initial_primal/initial_dual and setting cache.populated) cannot
access freed source memory.

Comment on lines +1044 to +1054
if (simplex_iteration_limit < 1) {
initialize_pseudo_costs_with_estimate(original_lp,
settings,
root_vstatus,
root_solution,
basic_list,
nonbasic_list,
fractional,
basis_factors,
pc);
} else {

⚠️ Potential issue | 🟠 Major

Return after the estimate-only fallback to prevent NaN overwriting valid estimates.

When simplex_iteration_limit < 1, the function populates pseudo-costs with estimates via initialize_pseudo_costs_with_estimate(). However, execution continues to the merge logic at lines 1173+. Since neither DS nor PDLP actually ran in this path, pdlp_obj_down/up remain NaN and dual_simplex_status_* remain UNSET.

At line 1192, when effective_batch_pdlp == 2:

if (source == sb_source_t::PDLP || effective_batch_pdlp == 2) { sb_dest = value; }

This unconditionally writes value (which is NaN from merge_sb_result returning {NaN, NONE}) back to pc.strong_branch_down/up, overwriting the valid estimates.

Suggested fix: add early return after estimate computation
   if (simplex_iteration_limit < 1) {
     initialize_pseudo_costs_with_estimate(original_lp,
                                           settings,
                                           root_vstatus,
                                           root_solution,
                                           basic_list,
                                           nonbasic_list,
                                           fractional,
                                           basis_factors,
                                           pc);
+    pc.update_pseudo_costs_from_strong_branching(fractional, root_solution.x);
+    settings.log.printf("Strong branching completed (estimate-only) in %.2fs\n",
+                        toc(strong_branching_start_time));
+    return;
   } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1044 - 1054, When
simplex_iteration_limit < 1 the function calls
initialize_pseudo_costs_with_estimate(...) but then continues into the merge
logic which can overwrite valid estimates with NaN (pdlp_obj_down/up and
dual_simplex_status_* are unset); add an early return immediately after
initialize_pseudo_costs_with_estimate(...) to skip the subsequent
merge_sb_result/merge logic so pc.strong_branch_down and pc.strong_branch_up are
not overwritten (this prevents the branch that checks sb_source_t::PDLP or
effective_batch_pdlp == 2 from writing NaN back into the pseudo-costs).

Comment on lines +1543 to +1560
if (use_pdlp) {
#pragma omp task default(shared)
batch_pdlp_reliability_branching_task(log,
rb_mode,
num_candidates,
start_time,
concurrent_halt,
original_lp,
new_slacks,
leaf_solution.x,
worker,
candidate_vars,
settings,
sb_view,
pdlp_warm_cache,
pdlp_obj_down,
pdlp_obj_up);
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Find call sites of reliable_variable_selection to verify they are within parallel regions

# Search for calls to reliable_variable_selection
rg -nC5 'reliable_variable_selection\s*\(' --type=cpp -g '!*pseudo_costs.cpp'

Repository: NVIDIA/cuopt

Length of output: 2273


🏁 Script executed:

#!/bin/bash
# Get broader context around line 841 in branch_and_bound.cpp
sed -n '800,900p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt

Length of output: 5140


🏁 Script executed:

#!/bin/bash
# Search for calls to variable_selection method
rg -n '\.variable_selection\s*\(' cpp/src/branch_and_bound/branch_and_bound.cpp | head -20

Repository: NVIDIA/cuopt

Length of output: 361


🏁 Script executed:

#!/bin/bash
# Get context around line 942 to identify the function and parallel regions
sed -n '920,960p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt

Length of output: 1885


🏁 Script executed:

#!/bin/bash
# Search for update_tree_impl to see where select_branch_variable is called
rg -n 'update_tree_impl\|select_branch_variable' cpp/src/branch_and_bound/branch_and_bound.cpp | head -30

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Search for OpenMP pragmas in branch_and_bound.cpp to find parallel regions
rg -n '#pragma omp' cpp/src/branch_and_bound/branch_and_bound.cpp

Repository: NVIDIA/cuopt

Length of output: 276


🏁 Script executed:

#!/bin/bash
# Get broader context around the parallel regions to understand the overall structure
sed -n '2600,2650p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt

Length of output: 2355


🏁 Script executed:

#!/bin/bash
# Search for run_scheduler and where worker threads execute
rg -n 'run_scheduler\|worker.*loop\|process_node' cpp/src/branch_and_bound/branch_and_bound.cpp | head -20

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Look at the reliable_variable_selection function and surrounding context to understand where it's called
sed -n '1520,1580p' cpp/src/branch_and_bound/pseudo_costs.cpp | cat -n

Repository: NVIDIA/cuopt

Length of output: 3149


🏁 Script executed:

#!/bin/bash
# Search for all calls to pc_.reliable_variable_selection (the direct calls)
rg -n 'pc_\.reliable_variable_selection\|\.reliable_variable_selection' cpp/src/branch_and_bound/ --type=cpp

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Search for all references to reliable_variable_selection
rg -n 'reliable_variable_selection' cpp/src/branch_and_bound/ --type=cpp

Repository: NVIDIA/cuopt

Length of output: 392


🏁 Script executed:

#!/bin/bash
# Find and examine single_threaded_solve
rg -n 'single_threaded_solve' cpp/src/branch_and_bound/branch_and_bound.cpp -A 30 | head -60

Repository: NVIDIA/cuopt

Length of output: 2958


🏁 Script executed:

#!/bin/bash
# Get the full single_threaded_solve function
sed -n '1797,1900p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt

Length of output: 5223


🏁 Script executed:

#!/bin/bash
# Search for plunge_with function
rg -n 'plunge_with' cpp/src/branch_and_bound/ --type=cpp -A 5 | head -40

Repository: NVIDIA/cuopt

Length of output: 2157


🏁 Script executed:

#!/bin/bash
# Get the implementation of plunge_with
sed -n '1430,1550p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt

Length of output: 5434


🏁 Script executed:

#!/bin/bash
# Search for update_tree function and what it calls
rg -n 'update_tree\(' cpp/src/branch_and_bound/branch_and_bound.cpp | head -5

Repository: NVIDIA/cuopt

Length of output: 331


🏁 Script executed:

#!/bin/bash
# Get the update_tree function implementation
sed -n '1282,1400p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt

Length of output: 5990


Fix OpenMP task directives in reliable_variable_selection to work in both parallel and non-parallel contexts.

The #pragma omp task and #pragma omp taskloop directives at lines 1544 and 1578+ require an active parallel region to execute as intended. However, in single-threaded mode, single_threaded_solve() (line 2622 in branch_and_bound.cpp) reaches them through the chain plunge_with() → update_tree() → variable_selection() → reliable_variable_selection() outside any parallel region, so the task/taskloop pragmas either execute sequentially on the calling thread (with task-runtime overhead and none of the intended parallelism) or behave incorrectly.

Either guard these pragmas with if (omp_in_parallel()) checks, restructure the call site to ensure it always runs within a parallel region, or conditionally compile different code paths for single vs. multi-threaded execution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1543-1560: the
OpenMP task/taskloop pragmas in reliable_variable_selection are being emitted
even when not inside a parallel region; wrap the omp task and omp taskloop usage
with runtime guards (e.g. check omp_in_parallel()) or provide an alternative
sequential path so the code behaves correctly in single-threaded runs.
Specifically, before emitting the "#pragma omp task" that calls
batch_pdlp_reliability_branching_task and before the "#pragma omp taskloop"
block in reliable_variable_selection, detect omp_in_parallel() and only use the
task/taskloop when true; otherwise call batch_pdlp_reliability_branching_task
(and the taskloop body) directly in a loop on the calling thread to preserve
semantics. Ensure checks are adjacent to the existing pragmas and reference
reliable_variable_selection, batch_pdlp_reliability_branching_task, and the
taskloop block so the change is obvious and localized.

@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

/merge

@rapids-bot rapids-bot bot merged commit a290700 into release/26.04 Apr 3, 2026
106 checks passed
@Kh4ster Kh4ster deleted the race_strong_branching_realibility_branching branch April 3, 2026 15:45

Labels

improvement Improves an existing functionality mip non-breaking Introduces a non-breaking change P1 pdlp
